Abstract. Finite-state tree automata are a well studied formalism for representing term languages. This paper studies the problem of determining the regularity of the set of instances of a finite set of terms with variables, where each variable is restricted to instantiations of a regular set given by a tree automaton. The problem was recently proved decidable, but with an unknown complexity. Here, the exact complexity of the problem is determined by proving EXPTIME-completeness. The main contribution is a new, exponential time algorithm that performs various exponential transformations on the involved terms and tree automata, and decides regularity by analyzing formulas over inequality and height predicates.
1. Introduction. Finite representations of infinite sets of terms are useful in many areas of computer science. The choice of formalism for this purpose depends on its expressiveness, but also on its computational properties. Finite-state tree automata (TA) [S] [2] are a well studied formalism for representing term languages, due to their good computational and expressiveness properties. They characterize the "regular term languages", a classical concept used, e.g., to describe the parse trees of a context-free grammar or the well-formed terms over a sorted signature [12], to characterize the solutions of formulas in monadic second-order logic [4], and to naturally capture type formalisms for tree-structured XML data [13] [H]. Similar to the case of regular sets of words, regular term languages have numerous convenient properties such as closure under Boolean operations (intersection, union, negation), decidable properties such as finiteness and inclusion, and they are characterized by many different formalisms such as regular grammars, regular term expressions, congruence classes of finite index, deterministic bottom-up TA, nondeterministic top-down TA, or sentences of monadic second-order logic [2]. Deterministic TA, for instance, can be effectively minimized and give rise to efficient parsing.

When the formalism used for representing an infinite set of terms is not a TA, it is often expedient to decide whether the represented set is in fact regular. A simple and natural way of describing an infinite set of terms is through the use of "patterns". A pattern is a term with variables; it describes all terms obtained by replacing the variables by (variable-free) terms; see, e.g., [11] [10], and the references given there. Term patterns are used for pattern matching in most modern programming languages, and were already present in very early languages such as LISP. They are a central concept in compiling, natural language processing, automated deduction, term rewriting, etc. In some of these applications, variables in patterns are restricted to be replaced by terms in a regular language. E.g., in a programming language with regular types (see, for instance, [H] [S]), variable instances might be constrained to regular term languages. Typically, term patterns in a programming language must be linear (i.e., every variable occurs at most once) in order to guarantee that the resulting type is regular. Our result shows that even if non-linear patterns are allowed (which is the case in logic programming languages such as Prolog), one can statically determine regularity, i.e., the existence of an exact regular type, in exponential time.

More precisely, we consider the problem of determining the regularity of the set of instances of a set of terms with regular constraints, which we abbreviate as the "RITRC" problem. A particular case of this problem, in which variables can be replaced by arbitrary terms (without variables), was considered in [11] and shown to be coNP-complete (cf. also [10]). The general RITRC problem was recently proved decidable [7]. The complexity of that decision procedure was left open in [7], but it can easily be seen to exceed exponential time. Moreover, the solution of [7] is based on a rather general result of [3] about first-order formulas with regular constraints, for which the complexity is not known.

In this paper, we determine the complexity of the RITRC problem by proving that it is EXPTIME-complete. At the beginning of Section 3 we show that the RITRC problem is EXPTIME-hard. This is done via a straightforward reduction from the finite intersection emptiness problem for tree automata. The remaining part of Section 3 describes an EXPTIME algorithm solving the problem, starting with an overview of it in Section 3.1. In summary, the algorithm first changes the regular constraints from several TA to one single tree automaton (of exponential size) with special properties. It then picks a non-linear term $s$ from the given set $S$ of terms, and checks the "infinite instances property of $s$ in $S$": are there infinitely many instantiations of a non-linear variable $x$ in $s$ which are not instances of $S - \{s\}$ (under the regular constraints)? If the infinite instances property holds for some $s$ in $S$, then our algorithm stops and we know that the set of terms represented by $S$ (under the regular constraints) is not regular. Otherwise, we can replace $s$ by a new term $s'$ that is linear in the variables, i.e., which does not contain duplicated variables. Roughly speaking, our algorithm then starts over again with the new set $(S - \{s\}) \cup \{s'\}$. In this way, the algorithm will construct a set $S'$ of terms in which all terms are linear in the variables if and only if the represented set is regular. To check the infinite instances property of $s$ in $S$, we instantiate the term $s$ at all non-variable positions of terms in $S - \{s\}$, and then formulate inequality constraints of the resulting terms with terms of $S - \{s\}$. It is a non-trivial task to efficiently solve such inequality constraints. In fact, in order to solve systems of such inequality constraints in EXPTIME, it was a crucial step for us to introduce additional height constraints on the variables of the inequality constraints. The final formula $\mathcal{F}$ over height and inequality predicates characterizes all instances of $s$ that are not instances of terms in $S - \{s\}$. Our algorithm solves the RITRC problem in exponential time by iteratively constructing and solving such formulas $\mathcal{F}$.

2. Preliminaries. The size of a set $S$ is denoted by $|S|$. A signature consists of an alphabet $\Sigma$, i.e., a finite set of symbols, together with a mapping that assigns to each symbol in $\Sigma$ a natural number, its arity. We write $\Sigma^{(k)}$ to denote the subset of symbols in $\Sigma$ that are of arity $k$, and we write $f^{(k)}$ to denote that $f$ is a symbol of arity $k$. The set of all terms over $\Sigma$ is denoted $T_\Sigma$ and is inductively defined as the smallest set $T$ such that for every $f \in \Sigma^{(k)}$, $k \geq 0$, and $t_1, \ldots, t_k \in T$, the term $f(t_1, \ldots, t_k)$ is in $T$. For a term of the form $a()$ we simply write $a$. For instance, if $\Sigma = \{f^{(2)}, a^{(0)}\}$, then $T_\Sigma$ is the set of all terms that represent binary trees with internal nodes labeled $f$ and leaves labeled $a$. We fix the set $\mathcal{X} = \{x_1, x_2, \ldots\}$ of variables, i.e., any set $V$ of variables is always assumed to be a subset of $\mathcal{X}$. The set of terms over $\Sigma$ with variables in $\mathcal{X}$, denoted $T_\Sigma(\mathcal{X})$, is the set of terms over $\Sigma \cup \mathcal{X}$ where every symbol in $\mathcal{X}$ has arity zero. By $\mathrm{Vars}(s)$ we denote the set of variables that occur in $s$. By $|t|$ we denote the size of $t$, defined recursively as $|f(t_1, \ldots, t_k)| = 1 + |t_1| + \cdots + |t_k|$ for each $f \in \Sigma^{(k)}$, $k \geq 0$, and $t_1, \ldots, t_k \in T_\Sigma(\mathcal{X})$, and $|x| = 1$ for each $x$ in $\mathcal{X}$. By $\mathrm{height}(t)$ we denote the height of $t$, defined recursively as $\mathrm{height}(f(t_1, \ldots, t_k)) = 1 + \max(\mathrm{height}(t_1), \ldots, \mathrm{height}(t_k))$ for each $f \in \Sigma^{(k)}$, $k \geq 1$, and $t_1, \ldots, t_k \in T_\Sigma(\mathcal{X})$, $\mathrm{height}(a) = 0$ for each $a \in \Sigma^{(0)}$, and $\mathrm{height}(x) = 0$ for each $x \in \mathcal{X}$.

Given a term $f(t_1, \ldots, t_k) \in T_\Sigma(\mathcal{X})$, its set of positions $\mathrm{Pos}(f(t_1, \ldots, t_k))$ equals $\{\epsilon\} \cup \bigcup_{1 \leq i \leq k} \{i.p \mid p \in \mathrm{Pos}(t_i)\}$. Here, $\epsilon$ denotes the root node, and $p.i$ denotes the $i$th child of position $p$. The subterm of $t$ at position $p$ is denoted by $t/p$, and the symbol of $t$ at position $p$ is denoted by $t[p]$; we say that $p$ is labeled by $t[p]$. For instance, for $s = g(f(a,b),c)$, $s/1$ equals $f(a,b)$ and position $1.2$ is labeled by $b$. For a set $\Gamma$, we use $\mathrm{Pos}_\Gamma(t)$ to denote the set of positions of $t$ that are labeled by symbols in $\Gamma$. In particular, we define for $t \in T_\Sigma(\mathcal{X})$ the sets $\mathrm{Pos}_V(t)$ and $\mathrm{Pos}_{NV}(t)$ of variable positions and non-variable positions as $\mathrm{Pos}_{\mathcal{X}}(t)$ and $\mathrm{Pos}(t) - \mathrm{Pos}_{\mathcal{X}}(t)$, respectively. E.g., for $s$ as above, $\mathrm{Pos}_{\{c\}}(s) = \{2\}$ and $\mathrm{Pos}_V(s) = \emptyset$. When a position $p$ is of the form $p_1.p_2$, we say that $p_1$ is a prefix of $p$. For a set of positions $P$, we denote by $\mathrm{Prefixes}(P)$ the set $\{p \mid \exists p' : p.p' \in P\}$. For terms $s, t$ and $p \in \mathrm{Pos}(s)$, we denote by $s[p \leftarrow t]$ the result of replacing the subterm at position $p$ in $s$ by the term $t$. For instance, $f(f(a,a),a)[1 \leftarrow a] = f(a,a)$.
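The following Python sketch mirrors the definitions of size, height, positions, subterms, and replacement above, and reproduces the examples given in the text. It is only an illustration under our own assumptions: a term is a tuple whose first entry is the symbol and whose remaining entries are the arguments, a variable is a plain string, a position is a tuple of child indices (the empty tuple playing the role of $\epsilon$), constants and variables have height 0, and all function names are hypothetical.

```python
from typing import Set, Tuple, Union

# Assumed encoding: f(t1, ..., tk) is ("f", t1, ..., tk), a constant a is ("a",),
# and a variable is a plain string such as "x1".
Term = Union[str, tuple]
Pos = Tuple[int, ...]  # () is the root position epsilon; (1, 2) is position 1.2


def is_var(t: Term) -> bool:
    return isinstance(t, str)


def size(t: Term) -> int:
    # |x| = 1 and |f(t1, ..., tk)| = 1 + |t1| + ... + |tk|
    if is_var(t):
        return 1
    return 1 + sum(size(c) for c in t[1:])


def height(t: Term) -> int:
    # height(x) = height(a) = 0; height(f(t1, ..., tk)) = 1 + max_i height(ti)
    if is_var(t) or len(t) == 1:
        return 0
    return 1 + max(height(c) for c in t[1:])


def positions(t: Term) -> Set[Pos]:
    # Pos(t) = {epsilon} united with {i.p | p in Pos(t_i)}
    result: Set[Pos] = {()}
    if not is_var(t):
        for i, c in enumerate(t[1:], start=1):
            result |= {(i,) + p for p in positions(c)}
    return result


def subterm(t: Term, p: Pos) -> Term:
    # t/p: follow the child indices of p from the root
    for i in p:
        t = t[i]
    return t


def label(t: Term, p: Pos) -> str:
    # t[p]: the symbol (or variable) at position p
    u = subterm(t, p)
    return u if is_var(u) else u[0]


def replace(s: Term, p: Pos, t: Term) -> Term:
    # s[p <- t]: rebuild the path from the root down to p
    if not p:
        return t
    i = p[0]
    return s[:i] + (replace(s[i], p[1:], t),) + s[i + 1:]


# Examples from the text.
s = ("g", ("f", ("a",), ("b",)), ("c",))
print(subterm(s, (1,)))  # ('f', ('a',), ('b',))
print(label(s, (1, 2)))  # b
```

The tuple encoding keeps terms immutable and hashable, which makes `replace` return a new term, as $s[p \leftarrow t]$ does.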

A (deterministic) tree automaton (over $\Sigma$), DTA for short, is a tuple $A = (Q, F, \Sigma, \delta)$ where $Q$ is a finite set of states, $F \subseteq Q$ is the set of accepting states, $\Sigma$ is a signature, and $\delta$ is a set of transitions of the form $f(q_1, \ldots, q_k) \to q$, where $f \in \Sigma^{(k)}$, $k \geq 0$, and $q, q_1, \ldots, q_k \in Q$. Moreover, for each $f \in \Sigma^{(k)}$ and each $q_1, \ldots, q_k \in Q$ there exists at most one (and at least one if the automaton is complete) $q$ such that $f(q_1, \ldots, q_k) \to q$ is in $\delta$. The language $L(A)$ recognized by $A$ is the set $\{t \in T_\Sigma \mid A(t) \in F\}$, where $A(t)$ is recursively defined as $A(f(t_1, \ldots, t_k)) = q$ if $f \in \Sigma^{(k)}$, $k \geq 0$, $t_1, \ldots, t_k \in T_\Sigma$, $f(q_1, \ldots, q_k) \to q$ is a transition in $\delta$, and, for each $i \in \{1, \ldots, k\}$, $q_i = A(t_i)$. Note that, when $A$ is not complete, $A(t)$ might be undefined. We also define, for $q \in Q$, the set $L(A, q) = \{t \in T_\Sigma \mid A(t) = q\}$ of terms for which $A$ arrives at state $q$. Note that $L(A, q) \cap L(A, q') = \emptyset$ for all $q \neq q'$. We also extend $A(t)$ to terms $t$ in $T_{\Sigma \cup Q}$ by assuming that the states $q \in Q$ have arity $0$ and $A(q) = q$ for each $q \in Q$. A set of terms $L \subseteq T_\Sigma$ is regular if there exists a DTA $A$ such that $L = L(A)$. The size $|\tau|$ of a transition $\tau = (f(q_1, \ldots, q_k) \to q)$ is $k + 2$, and the size $\|A\|$ of $A$ is $|Q| + \sum_{\tau \in \delta} |\tau|$.
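To illustrate the bottom-up evaluation $A(t)$ and the language $L(A)$, here is a minimal Python sketch. The encoding of $\delta$ as a dictionary from `(f, (q1, ..., qk))` to `q`, the tuple encoding of ground terms, and the helper names are our own assumptions; `run` returns `None` exactly where $A(t)$ is undefined for an incomplete DTA.

```python
from typing import Dict, FrozenSet, Hashable, Optional, Tuple

State = Hashable
# delta maps (symbol, (q1, ..., qk)) to q, encoding f(q1, ..., qk) -> q.
Delta = Dict[Tuple[str, Tuple[State, ...]], State]


def run(delta: Delta, t: tuple) -> Optional[State]:
    """Compute A(t) bottom-up for a ground term t = (f, t1, ..., tk).

    Returns None when A(t) is undefined (possible if the DTA is not complete).
    """
    child_states = []
    for child in t[1:]:
        q = run(delta, child)
        if q is None:
            return None
        child_states.append(q)
    return delta.get((t[0], tuple(child_states)))


def accepts(delta: Delta, final: FrozenSet[State], t: tuple) -> bool:
    """t is in L(A) iff A(t) is defined and belongs to F."""
    q = run(delta, t)
    return q is not None and q in final


# Example over {f/2, a/0}: the state records the parity of the number of
# leaves; with F = {"even"} the DTA accepts trees with an even number of leaves.
parity: Delta = {("a", ()): "odd"}
for q1 in ("even", "odd"):
    for q2 in ("even", "odd"):
        parity[("f", (q1, q2))] = "even" if q1 == q2 else "odd"
```

Because $A$ is deterministic, each subterm is evaluated exactly once, so a run takes time linear in $|t|$ (assuming constant-time dictionary lookups).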

Given a DTA, it is decidable whether its recognized language is (i) empty, (ii) finite, or (iii) has cardinality $k$, for a given $k$. The corresponding constructions all run in polynomial time and are straightforward generalizations of the ones for classical finite (word) automata; proofs can be found in Theorems 1.7.4, 1.7.6, and 1.7.10 of [2]. The following computational problems, together with the running times, are a consequence of the same proofs.

Lemma 2.1. Let $A = (Q, F, \Sigma, \delta)$ be a DTA and $k$ a natural number. Each of the following sets can be computed in polynomial time: $\mathrm{nonemptyStates}(A) := \{q \in Q \mid L(A, q) \neq \emptyset\}$ in $O(|Q| + |\delta|)$, $\mathrm{infiniteStates}(A) := \{q \in Q \mid |L(A, q)| = \infty\}$ in $O(|Q| \cdot |\delta|)$, and $\mathrm{countUpto}(A, k) := \{(q, \min(|L(A, q)|, k)) \mid q \in Q\}$ in $O(|Q| \cdot |\delta|)$.
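The three sets of Lemma 2.1 can be obtained with simple fixpoint computations. The Python sketch below is one straightforward way to compute them, not the construction of the cited proofs, and it makes no attempt to match the exact bounds of the lemma. It assumes $\delta$ is encoded as a dictionary from `(f, (q1, ..., qk))` to `q`; the function names are hypothetical. `count_upto` iterates the capped recurrence $c(q) = \min(k, \sum_{f(q_1, \ldots, q_m) \to q} \prod_i c(q_i))$, which is sound for a DTA because distinct transitions, or distinct argument terms, produce distinct terms. A state has an infinite language if and only if it is reachable from a cycle in the graph with an edge $q_i \to q$ for every transition $f(q_1, \ldots, q_m) \to q$ whose argument states are all non-empty.

```python
from math import prod
from typing import Dict, Hashable, Set, Tuple

State = Hashable
# delta maps (f, (q1, ..., qk)) to q, encoding the transition f(q1, ..., qk) -> q.
Delta = Dict[Tuple[str, Tuple[State, ...]], State]


def states_of(delta: Delta) -> Set[State]:
    """All states mentioned in some transition."""
    qs: Set[State] = set()
    for (_, args), q in delta.items():
        qs.update(args)
        qs.add(q)
    return qs


def count_upto(delta: Delta, k: int) -> Dict[State, int]:
    """countUpto(A, k) = {q: min(|L(A, q)|, k)} via a capped fixpoint."""
    count = {q: 0 for q in states_of(delta)}
    while True:
        new = {q: 0 for q in count}
        for (_, args), q in delta.items():
            # In a DTA, distinct transitions or distinct argument terms
            # produce distinct terms, so the numbers simply add up.
            new[q] += prod(count[a] for a in args)
        new = {q: min(c, k) for q, c in new.items()}
        if new == count:  # capped counts are non-decreasing and bounded by k
            return count
        count = new


def nonempty_states(delta: Delta) -> Set[State]:
    """nonemptyStates(A) = {q | L(A, q) is not empty}."""
    return {q for q, c in count_upto(delta, 1).items() if c >= 1}


def infinite_states(delta: Delta) -> Set[State]:
    """infiniteStates(A) = {q | L(A, q) is infinite}."""
    nonempty = nonempty_states(delta)
    succ: Dict[State, Set[State]] = {q: set() for q in states_of(delta)}
    for (_, args), q in delta.items():
        if all(a in nonempty for a in args):  # the transition is usable
            for a in args:
                succ[a].add(q)

    def reachable_from(src: State) -> Set[State]:
        """States reachable from src by a path of at least one edge."""
        seen: Set[State] = set()
        stack = list(succ[src])
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(succ[q])
        return seen

    reach = {q: reachable_from(q) for q in succ}
    on_cycle = {q for q in succ if q in reach[q]}
    return {q for q in succ if any(q == p or q in reach[p] for p in on_cycle)}
```

If no cycle reaches $q$, every run into $q$ has height at most $|Q|$, which is why $L(A, q)$ is then finite.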

Sets of Terms with Regular Constraints. Let $V \subseteq \mathcal{X}$ be a finite set of variables and $\Sigma$ a signature. A regular constraint (over $V$ and $\Sigma$) is a mapping $M$ that associates to every $x \in V$ a DTA over $\Sigma$. A solution of $M$ is a mapping $\varphi : V \to T_\Sigma$ such that, for each $x \in V$, $\varphi(x) \in L(M(x))$. A set of terms with regular constraints (over $V$ and $\Sigma$) is a pair $(S, M)$ where $S$ is a finite subset of $T_\Sigma(V)$ and $M$ is a regular constraint over $V$ and $\Sigma$. The language $L((S, M))$ of $(S, M)$ is defined as $\{t \mid \exists \varphi, s : (t = \varphi(s) \wedge s \in S \wedge \varphi \text{ is a solution of } M)\}$. A term in $L((S, M))$ is also called an instance of $(S, M)$.

The following result is due to [11], cf. also [10].

Proposition 2.2. Let $V \subseteq \mathcal{X}$, $S$ a finite subset of $T_\Sigma(V)$, and $M$ the regular constraint that maps every $x \in V$ to the trivial DTA that recognizes $T_\Sigma$. Regularity of $L((S, M))$ is coNP-complete.

When analyzing complexity, with $\|S\|$ we refer to the sum of sizes of all terms in $S$, and with $\|M\|$ we refer to the sum of sizes of all DTA in the image of $M$. With $|S|$ and $|M|$ we refer, as usual, to the number of elements in the sets $S$ and $M$ (i.e., the number of pairs of the set defining the mapping $M$). We also make the following assumption in order to ease the complexity analysis.

Assumption: The maximum arity of a function symbol in $\Sigma$ is 2. It is well known that any arbitrary tree can be coded as a binary tree of essentially the same size. Usual such codings (such as the one taking first-child to left-child and next-sibling to right-child) preserve regularity of sets of terms (see, e.g., Section 8.3.1 in [2]); moreover, it can easily be seen that the transformation of the regular constraints into this new binary signature produces an at most quadratic size increase.
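The first-child/next-sibling coding mentioned in the assumption can be sketched as follows. This is an illustrative Python version under our own assumptions: an unranked tree is a pair `(label, [children])`, every original label becomes a binary symbol whose left argument encodes the first child and whose right argument encodes the next sibling, and the fresh constant `#` (a name we chose) marks a missing child or sibling. The decoder shows that no information is lost.

```python
from typing import List, Tuple

Unranked = Tuple[str, list]  # (label, [children]); assumed encoding
NIL = ("#",)                  # assumed fresh constant: "no child / no sibling"


def encode_forest(forest: List[Unranked]) -> tuple:
    """Left argument: first child; right argument: next sibling."""
    if not forest:
        return NIL
    (label, children), siblings = forest[0], forest[1:]
    return (label, encode_forest(children), encode_forest(siblings))


def encode(tree: Unranked) -> tuple:
    return encode_forest([tree])


def decode_forest(b: tuple) -> List[Unranked]:
    if b == NIL:
        return []
    label, first_child, next_sibling = b
    return [(label, decode_forest(first_child))] + decode_forest(next_sibling)


def decode(b: tuple) -> Unranked:
    (tree,) = decode_forest(b)
    return tree


def binary_size(b: tuple) -> int:
    """n original nodes give n binary nodes plus n + 1 occurrences of NIL."""
    return 1 if b == NIL else 1 + binary_size(b[1]) + binary_size(b[2])


# f(a, b, c) becomes f(a(#, b(#, c(#, #))), #).
t = ("f", [("a", []), ("b", []), ("c", [])])
print(encode(t))
```

The encoded tree has $2n + 1$ nodes for an input with $n$ nodes, which is the "essentially the same size" claim in concrete form.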

3. Regularity of the instances of a set of terms with regular constraints. 

Let $(S, M)$ be a set of terms with regular constraints. The "regularity of the instances of a set of terms with regular constraints problem", RITRC for short, asks whether or not the set $L((S, M))$ is regular. We know, by Proposition 2.2, that RITRC is coNP-complete in the particular case that $M$ maps each variable to a DTA that accepts all terms. In general, i.e., with regular constraints, decidability of RITRC was proved in [7]; however, the complexity remained open. The algorithm of [7] does not run in exponential time; in fact, it has a far worse complexity. In this section we show that RITRC is EXPTIME-complete. We start with the easy part by showing that RITRC is EXPTIME-hard.

Theorem 3.1. RITRC is EXPTIME-hard.

Proof. Let $\Sigma$ be a signature with $\Sigma^{(2)} \neq \emptyset$ and let $A_1, \ldots, A_n$ be DTAs over $\Sigma$. It is well known that testing whether $L(A_1) \cap \cdots \cap L(A_n) = \emptyset$ is EXPTIME-complete, cf. Theorem 1.7.5 of [5]. It follows that "universality of union", i.e., testing whether $L(A_1) \cup \cdots \cup L(A_n) = T_\Sigma$, is EXPTIME-complete. This is because a DTA can easily be complemented in polynomial time (first, complete the DTA by adding, for any missing transition, a transition to a new "sink" state; second, change $F$ into $Q - F$). We now reduce universality of union to RITRC. Let $A$ be any fixed DTA that recognizes $T_\Sigma$ and let $f \in \Sigma^{(2)}$. The set of terms with regular constraints $(S, M)$, where

$$S = \{f(f(x,x),y),\ f(x'_1, x_1), \ldots, f(x'_n, x_n)\},$$
$$M = \{x_1 \mapsto A_1, \ldots, x_n \mapsto A_n,\ x \mapsto A,\ y \mapsto A,\ x'_1 \mapsto A, \ldots, x'_n \mapsto A\},$$

is regular if and only if $\bigcup_{1 \leq i \leq n} L(A_i) = T_\Sigma$. To see this, consider first the case where $\bigcup_{1 \leq i \leq n} L(A_i) = T_\Sigma$. Then $L((S, M)) = L((\{f(x'_1, x_1), \ldots, f(x'_n, x_n)\}, M)) = \{f(s, t) \mid s, t \in T_\Sigma\}$, which is regular. In the other case, let $t$ be in $T_\Sigma - \bigcup_{1 \leq i \leq n} L(A_i)$. Intersect $L((S, M))$ with the regular set $\{f(s, t) \mid s \in T_\Sigma\}$. Since regular term languages are closed under intersection, the resulting set would be regular if $L((S, M))$
was; but the resulting intersection is $\{f(f(t', t'), t) \mid t' \in T_\Sigma\}$. By standard pumping arguments (see, e.g., Example 1.2.1 of [2]), this set is not regular. Thus, $L((S, M))$ is not regular in this case. □
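To make the reduction in the proof concrete, the Python sketch below builds the instance $(S, M)$ from DTAs $A_1, \ldots, A_n$, and also shows the complementation step used to pass from intersection emptiness to universality of union. The encodings are our own assumptions: terms are nested tuples with variables as strings, a DTA is a triple `(states, final, delta)` with `delta` a dictionary from `(f, (q1, ..., qk))` to `q`, and the variable $x'_i$ is spelled `xp{i}`. The sketch only constructs the instance; it does not decide anything.

```python
from itertools import product
from typing import Dict, FrozenSet, Hashable, List, Tuple

Delta = Dict[Tuple[str, Tuple[Hashable, ...]], Hashable]  # f(q1..qk) -> q
DTA = Tuple[FrozenSet[Hashable], FrozenSet[Hashable], Delta]  # (Q, F, delta)


def complement(dta: DTA, signature: Dict[str, int]) -> DTA:
    """Complete the DTA with a fresh sink state, then replace F by Q - F."""
    states, final, delta = dta
    sink = object()  # a fresh object cannot coincide with an existing state
    qs = states | {sink}
    full = dict(delta)
    for f, k in signature.items():
        for args in product(qs, repeat=k):
            full.setdefault((f, args), sink)  # missing transition -> sink
    return qs, qs - final, full


def reduction_instance(dtas: List[DTA], universal: DTA, f: str = "f"):
    """Build (S, M) of the proof; x'_i is spelled "xp{i}" (assumed naming).

    `universal` is any DTA recognizing all of T_Sigma; f must be binary.
    """
    S = [(f, (f, "x", "x"), "y")]  # f(f(x, x), y)
    M = {"x": universal, "y": universal}
    for i, dta in enumerate(dtas, start=1):
        S.append((f, f"xp{i}", f"x{i}"))  # f(x'_i, x_i)
        M[f"x{i}"] = dta
        M[f"xp{i}"] = universal
    return S, M
```

The instance has $n + 1$ terms and constraints of size linear in $\sum_i \|A_i\| + \|A\|$, which is what makes the reduction polynomial.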

Proving that RITRC is in EXPTIME is considerably more complicated. 

3.1. Overview of our algorithm for RITRC. Algorithm in [7]. In [7], decidability of RITRC was proved. We first explain the idea of that proof, and why it does not give rise to an EXPTIME algorithm. Then we give an overview of the algorithm presented in this paper. The following is the basic property used for deciding RITRC in [7] (and here).

Definition 3.2. Let $(S, M)$ be a set of terms with regular constraints. The term $s \in S$ satisfies the infinite-instances property in $(S, M)$ if some variable $x$ has multiple occurrences in $s$, and there exist infinitely many instances $\varphi_1(s), \varphi_2(s), \ldots$ of $(\{s\}, M)$ which are not instances of $(S - \{s\}, M)$ and which are pairwise different on $x$, i.e., $\varphi_i(x) \neq \varphi_j(x)$ for all $i \neq j$.

In [7] it was shown that the infinite-instances property is decidable and that it implies non-regularity of $(S, M)$. To decide RITRC, the algorithm of [7] first looks for a term in $S$ with multiple occurrences of some variable $x$ satisfying $|L(M(x))| = \infty$. If no such term exists, then it stops, concluding regularity of $L((S, M))$ (note that in this case $L((\{s\}, M))$ is regular for each term $s$ in $S$, and regular sets are closed under union). Otherwise, it checks the infinite-instances property of $s$ in $(S, M)$. In the affirmative case, it stops, concluding non-regularity of $L((S, M))$. In the negative case, there are only finitely many possible instantiations $\varphi(x)$ of each duplicated variable $x$ in $s$ providing a term in $L((\{s\}, M))$ and not in $L((S - \{s\}, M))$. Thus, by replacing $s$ by a finite number $\{s_1, \ldots, s_k\}$ of instantiations of $s$, the represented language $L((S, M))$ is preserved, and we obtain fewer duplicated variables. The algorithm in [7] decides regularity of $L((S, M))$ by iterating this process.

Estimating the complexity. To determine the complexity of the previous algorithm, we need to know how large the number $k$ of instantiations of $s$ is, how large the terms $s_1, \ldots, s_k$ are, and, of course, how expensive it is to decide the infinite-instances property. In [7], the latter is solved through a result of [3] about first-order formulas with regular constraints. The precise complexity of this result of [3] is not known, but it is expected to be higher than that of solving the infinite-instances property, since it solves a more general problem. We therefore devise our own algorithm for checking this property. But the sum of sizes of the terms $s_1, \ldots, s_k$ also poses a problem, as it can grow iterated exponentially, so the algorithm in [7] is certainly not in EXPTIME. One of the ideas of our new algorithm is hence not to replace $s$ by $\{s_1, \ldots, s_k\}$. Instead, we are able to find a "small" number $h$ (which depends on $S$ and $M$) such that all terms $s_i$ are guaranteed to be of height smaller than $h$. To take advantage of this fact, we add a new kind of constraint to $(S, M)$ which allows duplicated variables $x$ of $s$ to be replaced only by "small" terms. The algorithm then continues with this new system (called restricted regular constraints, see Definition 3.7), which has regular constraints plus height constraints on the variables.

Infinite-instances algorithm. How do we check the infinite-instances property of $s$ in $(S, M)$? In Sections 3.4, 3.5, and 3.6 we give an algorithm that solves this problem under several assumptions. To begin with, we require that the term $s$ is determined (see Definition 3.11 for the precise notion) in all the non-variable positions of terms in $S$. We also assume that the regular constraint $R$ is given by a single DTA $A$ (instead of the multiple ones in the image of $M$), and a mapping that associates variables with states of $A$. Finally, we require this DTA $A$ to satisfy the 1-or-$|S|$ property of Definition 3.5, which says that for any state $q$ of $A$, the cardinality of $L(A, q)$ is either 1, or it is greater than or equal to $|S|$. The reason for these assumptions is as follows. In order to decide the infinite-instances property, we compute a formula $\mathcal{F}$ whose solutions are the instances in $L((\{s\}, R))$ that are not in $L((S - \{s\}, R))$. This formula is a disjunction of conjunctions of inequalities, where each conjunction has at most $|S| - 1$ inequalities. After some transformations on $\mathcal{F}$ by means of a system of inference rules, the variables $x$ with an associated state $q_x$ of $A$ satisfying $|L(A, q_x)| = 1$ disappear. Thanks to the 1-or-$|S|$ property, the remaining variables in $\mathcal{F}$ have at least $|S|$ possible instantiations. This fact is used to show that, for any surviving conjunction in $\mathcal{F}$, there is a variable instantiation that makes true the at most $|S| - 1$ inequalities it is composed of, and variables with infinite language have infinitely many choices. Hence, we obtain that $s$ satisfies the infinite-instances property in $(S, R)$ if the transformed formula $\mathcal{F}$ is not empty.

Overview of the algorithm. We give an outline of the EXPTIME algorithm that solves RITRC for a given instance $(S_1, M_1)$. First of all, we transform $(S_1, M_1)$ into $(S_2, R_1)$ while preserving the represented language, where $R_1$ is a single regular constraint (Definition 3.3), and $S_2$ is the adaptation of $S_1$ from $M_1$ to $R_1$. Intuitively, $(S_2, R_1)$ is the same problem stated with a single 1-or-$|S_1|$ DTA; the sizes of both $S_2$ and $R_1$ can be exponential with respect to the sizes of $S_1$ and $M_1$. This transformation is described in Section 3.2. The single regular constraint $R_1$ is then converted to a restricted regular constraint $R_2$, the new type of constraint introduced in Section 3.3, which takes height restrictions into account.

The algorithm then proceeds as follows. At each step it picks a term $s$ of $S_2$ without height constraints and with multiple occurrences of some variable $x$ satisfying $|L(A, C(x))| = \infty$. If no term of this kind exists, then it stops, concluding regularity of $L((S_2, R_2))$. Otherwise, it chooses a term $s$ satisfying the above conditions, and checks the infinite-instances property of $s$ with respect to $(S_2, R_2)$. To do so, the algorithm loops over all possible partial instantiations $s_i$ of $s$ in the non-variable positions of $S_2$, and for each $s_i$, it finds a subset $S_3 \subseteq S_2$, with $|S_3| = |S_1| - 1$, such that $s_i$ has the infinite-instances property for $S_2$ if and only if it has the property for $S_3$. The fact that $|S_3|$ is small allows checking the infinite-instances property in exponential time. In the affirmative case, the algorithm stops, concluding non-regularity of $L((S_2, R_2))$. If no determination $s_i$ satisfies the infinite-instances property, the restricted regular constraint $R_2$ is modified so as to impose height constraints on the variables of $s$ with multiple occurrences. Since the number of terms with duplicated variables and without height constraints decreases, the iteration of this process decides regularity of $L((S_1, M_1))$. A careful analysis of all the steps involved will show that the time complexity is exponential.
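The control flow of this outline can be summarized by the following Python skeleton. It is deliberately schematic: the infinite-instances test of Sections 3.4 to 3.6 is not part of this excerpt, so it is passed in as a hypothetical oracle `has_infinite_instances`, and the parameter `infinite_vars`, the term encoding (nested tuples, variables as strings), and the set `W` of height-constrained variables (as in Definition 3.7) are assumptions made for illustration. Terms are assumed not to share variables, as guaranteed by Lemma 3.6.

```python
from collections import Counter
from typing import Callable, List, Set


def variables(t) -> List[str]:
    """All variable occurrences in a term (variables are plain strings)."""
    if isinstance(t, str):
        return [t]
    return [v for child in t[1:] for v in variables(child)]


def decide_regularity(
    S: List[tuple],
    infinite_vars: Set[str],
    has_infinite_instances: Callable[[tuple, List[tuple], Set[str]], bool],
) -> bool:
    """Schematic main loop of the outline; True means "regular".

    infinite_vars: variables x with |L(A, C(x))| infinite (assumed given).
    has_infinite_instances: hypothetical oracle for the check of
    Sections 3.4-3.6, called with the chosen term, S and the current W.
    """
    W: Set[str] = set()  # variables already under a height constraint
    while True:
        chosen, dup_vars = None, set()
        for s in S:
            occ = Counter(variables(s))
            dup = {x for x, c in occ.items() if c > 1}
            # Pick s if some duplicated variable has an infinite language
            # and is not yet height-constrained.
            if any(x in infinite_vars and x not in W for x in dup):
                chosen, dup_vars = s, dup
                break
        if chosen is None:
            return True   # no candidate term left: regular
        if has_infinite_instances(chosen, S, W):
            return False  # infinite-instances property (Lemma 3.8): not regular
        W |= dup_vars     # impose height constraints on the duplicated variables
```

Each iteration adds at least one variable to `W`, so the loop runs at most once per duplicated variable; the cost of the real algorithm lies entirely in the oracle.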

3.2. Simplification to a single DTA. Recall from the preliminaries that we assume $\Sigma$ to be a fixed but arbitrary signature containing no symbol of arity greater than 2. We start with a set of terms with regular constraints $(S_1, M_1)$ over a finite set of variables $V$. Recall that $S \subseteq T_\Sigma(V)$ is a finite set of terms and $M$ is a function that maps each $x \in V$ to a DTA over $\Sigma$. We now adapt this definition to a setting with only one single DTA $A$, where variables in $V$ are now mapped to states in $A$. Moreover, we do not need accepting states anymore and simply drop them from $A$'s definition (a "DTA without accepting states").

Definition 3.3. A single regular constraint (over $V$ and $\Sigma$) is a pair $R = (A, C)$, where $A = (Q, \Sigma, \delta)$ is a complete DTA without accepting states and $C$ is a mapping $C : V \to Q$. The size $\|R\|$ of $R$ is $|V| + \|A\|$. A solution of $R$ is a mapping $\varphi : V \to T_\Sigma$ such that, for each $x \in V$, it holds that $\varphi(x) \in L(A, C(x))$. A set of terms with single regular constraints (over $V$ and $\Sigma$) is a pair $(S, R)$, where $S$ is a finite subset of $T_\Sigma(V)$ and $R = (A, C)$ is a single regular constraint over $V$ and $\Sigma$. The language $L((S, R))$ of $(S, R)$ is defined as $\{t \mid \exists \varphi, s : (t = \varphi(s) \wedge s \in S \wedge \varphi \text{ is a solution of } R)\}$. A term in $L((S, R))$ is also called an instance of $(S, R)$.

Transforming a set of terms with regular constraints $(S_1, M_1 = \{x_1 \mapsto A_1, \ldots, x_n \mapsto A_n\})$ into a set of terms with single regular constraints $(S_2, R_1)$ satisfying $L((S_2, R_1)) = L((S_1, M_1))$ is rather easy by considering the product automaton $A = A_1 \times \cdots \times A_n$. But the size of $(S_2, R_1)$ can be exponential in the size of $(S_1, M_1)$. Moreover, it follows from Proposition 2.2 that regularity of $L((S_2, R_1))$ is at least coNP-hard. Hence, it is not enough to have an EXPSPACE-reduction from one problem to the other if we want to obtain an EXPTIME algorithm for the initial problem.

Thus, in the translation from $(S_1, M_1)$ into $(S_2, R_1)$ we keep in mind some additional properties obtained by the transformation process. For instance, the terms in $S_2$ are very similar to those in $S_1$ because they are obtained through variable renamings; we call this "structural similarity". Moreover, as mentioned in the outline of Section 3.1, we want the DTA $A$ to have the "1-or-$n$" property, with $n = |S_1|$. We proceed to define both properties.

Definition 3.4. Let $V, V'$ be sets of variables. A total function $\rho : V \to V'$ is a variable renaming if it is injective, i.e., $\rho(x) \neq \rho(y)$ for $x \neq y$. For a term $s$, $\rho(s)$ is the term obtained from $s$ by replacing in $s$ each variable $x \in V$ by $\rho(x)$. Two terms $s$ and $t$ are structurally similar, denoted by $s \sim t$, if $t = \rho(s)$ for a variable renaming $\rho$. For a set of terms $S$, $\mathrm{StructDiff}(S)$ is the maximum number of pairwise non-structurally-similar terms in $S$, i.e., $\mathrm{StructDiff}(S) = \max_{S' \subseteq S,\ \forall s, t \in S' : (s \neq t \Rightarrow s \not\sim t)} |S'|$. Given a single regular constraint $R = (A, C)$, we say that two terms $s$ and $t$ are structurally equal (with respect to $R$) if they are structurally similar, and $C(s[p]) = C(t[p])$ for all $p \in \mathrm{Pos}_V(s)$.

Note that if $s$ and $t$ are structurally equal with respect to $R$, then $L((\{s\}, R)) = L((\{t\}, R))$; the converse does not necessarily hold.
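Structural similarity (Definition 3.4) amounts to finding an injective variable renaming, which can be built greedily while traversing both terms in parallel; structural equality then only compares $C$ on the matched variables. The Python sketch below assumes terms are nested tuples with variables as plain strings and represents $C$ as a dictionary; the function names are our own.

```python
from typing import Dict, Optional


def renaming(s, t, rho: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Return an injective rho with rho(s) = t, or None if none exists."""
    rho = {} if rho is None else rho
    if isinstance(s, str) or isinstance(t, str):
        if not (isinstance(s, str) and isinstance(t, str)):
            return None                        # variable vs. non-variable
        if s in rho:
            return rho if rho[s] == t else None
        if t in rho.values():
            return None                        # would break injectivity
        rho[s] = t
        return rho
    if s[0] != t[0] or len(s) != len(t):
        return None                            # different symbols at this position
    for a, b in zip(s[1:], t[1:]):
        if renaming(a, b, rho) is None:
            return None
    return rho


def structurally_similar(s, t) -> bool:
    return renaming(s, t) is not None


def structurally_equal(s, t, C: Dict[str, object]) -> bool:
    """Similar, and C(s[p]) = C(t[p]) at every variable position p."""
    rho = renaming(s, t)
    # Every variable position of s holds some x and the same position of t
    # holds rho(x), so comparing C over the pairs of rho covers all positions.
    return rho is not None and all(C[x] == C[y] for x, y in rho.items())
```

The check is linear in the size of the terms up to the injectivity test, which is enough to remove repetitions modulo structural equality when building $S'$.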

Definition 3.5. Let $A = (Q, \Sigma, \delta)$ be a DTA. Let $n$ be a natural number. We say that $A$ is a 1-or-$n$ DTA if each state $q$ in $Q$ satisfies either $|L(A, q)| = 1$ or $|L(A, q)| \geq n$.

Lemma 3.6. Let $(S, M)$ be a set of terms with regular constraints. Then, $(S, M)$ can be transformed in exponential time into a set of terms with single regular constraints $(S', R)$ such that $L((S', R)) = L((S, M))$ and the following properties hold.

• $R = (A, C)$ satisfies that $A$ is a 1-or-$|S|$ DTA.

• $A = (Q, \Sigma, \delta)$ is complete and satisfies $|Q| \leq \|M\|^{|M|} \cdot |S|$ and $|\delta| \leq |\Sigma| \cdot |Q|^2 \leq |\Sigma| \cdot \|M\|^{2|M|} \cdot |S|^2$.

• $|S'| \leq |S| \cdot |Q|^{|M|} \leq \|M\|^{|M|^2} \cdot |S|^{|M|+1}$.

• Each term in $S'$ is structurally similar to some term in $S$. In particular, $\mathrm{StructDiff}(S') \leq |S|$.

• Every two distinct terms $s, t \in S'$ are not structurally equal with respect to $R$.

• Each two distinct terms $s, t \in S'$ do not share variables.

Proof. Let $M = \{x_1 \mapsto A_1, \ldots, x_n \mapsto A_n\}$ and $A_i = (Q_i, F_i, \Sigma, \delta_i)$ for $1 \leq i \leq n$. We first complete each DTA $A_i$ to a new DTA $A'_i = (Q'_i, F_i, \Sigma, \delta'_i)$ by adding a sink state and all undefined transitions to it. Recall the assumption that the maximum arity of $\Sigma$ is 2. Thus, $|Q'_i| = |Q_i| + 1$ and $|\delta'_i| \leq |\Sigma| \cdot |Q'_i|^2 = |\Sigma| \cdot (|Q_i| + 1)^2$. We now construct the product automaton (without accepting states) $A' = (Q', \Sigma, \delta')$, i.e., we set $Q' = Q'_1 \times \cdots \times Q'_n$ and, if for each $1 \leq i \leq n$, $\delta'_i$ has the transition $f(q_{i,1}, \ldots, q_{i,k}) \to q_i$, then we add the transition $f((q_{1,1}, \ldots, q_{n,1}), \ldots, (q_{1,k}, \ldots, q_{n,k})) \to (q_1, \ldots, q_n)$ to $\delta'$. Since each state of $A'$ is a tuple of $|M|$ states, each of which is a state of one of the automata in $M$ or the corresponding added sink state, $|Q'| \leq \|M\|^{|M|}$.

We then transform $A'$ into a 1-or-$|S|$ DTA. To this end, we compute the mapping $M' : Q' \to \{1, \ldots, |S|\}$ with $M' = \mathrm{countUpto}(A', |S|) = \{(q, \min(|L(A', q)|, |S|)) \mid q \in Q'\}$, according to Lemma 2.1. Now, using $M'$ we obtain the desired $A$ as output of the following algorithm.

Input: A' = (Q', Σ, δ') and M' : Q' → {1, ..., |S|}.
Q := {q | q ∈ Q' ∧ M'(q) = |S|} ∪ {q^j | q ∈ Q' ∧ 1 ≤ j ≤ M'(q) < |S|}.
δ := ∅.
For each q in Q' do:
    If M'(q) = |S| then:
        For each f(q_1, ..., q_m) → q in δ' do:
            For each i_1, ..., i_m with q_1^{i_1}, ..., q_m^{i_m} ∈ Q do:
                Add f(q_1^{i_1}, ..., q_m^{i_m}) → q to δ.
    else:
        Let τ_1 → q, ..., τ_k → q be all transitions of δ' with q as right-hand side.
        counter := 1.
        For each i in {1, ..., k} do:
            Let f(q_1, ..., q_m) → q be τ_i → q.
            For each i_1, ..., i_m with q_1^{i_1}, ..., q_m^{i_m} ∈ Q do:
                Add f(q_1^{i_1}, ..., q_m^{i_m}) → q^{counter} to δ.
                counter++.
Complete A = (Q, Σ, δ) and return the result.

It is clear that this algorithm generates a complete 1-or-$|S|$ DTA $A$ with $|Q| \leq \|M\|^{|M|} \cdot |S|$, because at most $|S|$ new states are created for every state in $Q'$. Moreover, since the maximum arity of $\Sigma$ is 2, at most $|\Sigma| \cdot |Q|^2$ transitions are possible with such a number of states. The construction runs in exponential time because $A'$ is constructed in exponential time, $M'$ is constructed in time polynomial in $\|A'\|$ by Lemma 2.1, and $A$ is constructed in time $O(\|A\|)$.
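The splitting algorithm above can be turned into the following Python sketch. Assumptions: $\delta'$ is a dictionary from `(f, (q1, ..., qm))` to `q`; `counts` plays the role of $M'$ (for instance as produced by a countUpto computation), and, unlike the stated codomain, we also allow the value 0 for states with an empty language, which then receive no copies; a copy $q^j$ is written as the pair `(q, j)`, while a state with $M'(q) = n$ is kept as `(q, None)`, matching the convention that $q^{i}$ stands for $q$ itself in that case. The final completion step is omitted here.

```python
from itertools import product
from typing import Dict, Hashable, List, Tuple

Delta = Dict[Tuple[str, Tuple[Hashable, ...]], Hashable]


def split_1_or_n(delta: Delta, counts: Dict[Hashable, int], n: int) -> Delta:
    """States with fewer than n terms get one copy per term (1-or-n DTA)."""

    def copies(q) -> List[Tuple[Hashable, object]]:
        if counts[q] >= n:
            return [(q, None)]                            # q itself
        return [(q, j) for j in range(1, counts[q] + 1)]  # q^1, ..., q^{M'(q)}

    new_delta: Delta = {}
    counter = {q: 1 for q in counts}  # next unused copy index per state
    for (f, args), q in delta.items():
        for combo in product(*(copies(a) for a in args)):
            if counts[q] >= n:
                new_delta[(f, combo)] = (q, None)
            else:
                # Each combination of argument copies denotes one distinct
                # term, so q receives exactly M'(q) fresh copies in total.
                new_delta[(f, combo)] = (q, counter[q])
                counter[q] += 1
    return new_delta
```

Grouping by right-hand side, as the pseudocode does, is not needed here because the counter is kept per state.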

Now, the set $S'$ is obtained in the following way. Recall that the states $q$ in $Q$ are in fact of the form $q = (q_1, \ldots, q_n)^j$, i.e., are tuples of states $q_1 \in Q'_1, \ldots, q_n \in Q'_n$ plus an index $j$ satisfying $1 \leq j \leq M'((q_1, \ldots, q_n))$. For each variable $x_i$ in the domain of $M$, we define the set of variables $V(x_i) = \{x_i^{q} \mid q = (q_1, \ldots, q_n)^j \in Q \wedge q_i \in F_i\}$. We define the domain $V$ of the mapping $C$ as $\bigcup_{i \in \{1, \ldots, n\}} V(x_i)$, and the image of each $x_i^q$ by $C$ as $q$. Finally, let $\Theta$ be the set of substitutions $\varphi$ over $\{x_1, \ldots, x_n\}$ satisfying $\varphi(x_i) \in V(x_i)$. We compute $S'$ as a minimal set satisfying that each one of its terms is structurally equal to some term in $\{\varphi(s) \mid s \in S \wedge \varphi \in \Theta\}$, and vice versa (i.e., $S'$ is computed from $\{\varphi(s) \mid s \in S \wedge \varphi \in \Theta\}$ by removing repetitions modulo structural equality). Moreover, we force the terms in $S'$ not to share variables, by renaming them in $S'$ and defining them in $V$ and $C$ whenever necessary. Obviously, each term in $S'$ is structurally similar to some term in $S$, and any two distinct terms in $S'$ are not structurally equal. Each $V(x_i)$ has at most $|Q|$ variables. Thus, $\Theta$ has at most $|Q|^{|M|}$ substitutions, and hence $|S'| \leq |S| \cdot |Q|^{|M|}$. Generating $S'$ consists of considering all such combinations of a term in $S$ and a substitution in $\Theta$. Thus, the time complexity for creating $S'$ from $S$ and $A$ is proportional to its size, i.e., it is in $O(\|S\| \cdot |Q|^{|M|})$. In total, $(S', R = (A, C))$ is constructed in exponential time w.r.t. $\|S\| + \|M\|$. □
3.3. Adding height constraints. Let $(S_2, R_1)$ be the set of terms with single regular constraints that was obtained from $(S_1, M_1)$ according to Lemma 3.6. Our algorithm proceeds by considering a term $s$ in $S_2$, and analyzing the kind of instances which are in $L((\{s\}, R_1))$ but not in $L((S_2 - \{s\}, R_1))$. Depending on this analysis, it either concludes non-regularity of $L((S_2, R_1))$, or deduces that the height of the substitutions for some variables of $s$ can be bounded by $|Q| + 2H$, where $H$ is the maximum height of the terms in $S_1$. To manage this height constraint, we extend the notion of single regular constraint as follows.

Definition 3.7. A restricted regular constraint (over $\Sigma$) is a tuple $R = (A, V, C, W, h)$, where $W \subseteq V$ are sets of variables, $A = (Q, \Sigma, \delta)$ is a DTA, $C$ is a mapping $C : V \to Q$, and $h$ is a natural number. The size $\|R\|$ of $R$ is $|V| + \|A\|$. A solution of $R$ is a mapping $\varphi : V \to T_\Sigma$ such that for all $x \in V$ it holds that $\varphi(x) \in L(A, C(x))$, and moreover, if $x \in W$ then $\mathrm{height}(\varphi(x)) \leq h$. For a finite set $S \subseteq T_\Sigma(V)$, the pair $(S, R)$ is a set of terms with restricted regular constraints. The language $L((S, R))$ of $(S, R)$ is $\{t \mid \exists \varphi, s : (t = \varphi(s) \wedge s \in S \wedge \varphi \text{ is a solution of } R)\}$. A term in $L((S, R))$ is also called an instance of $(S, R)$.
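Checking whether a given mapping $\varphi$ is a solution of a restricted regular constraint $R = (A, V, C, W, h)$ only needs a run of $A$ per variable and a height computation. The Python sketch below is self-contained and assumes ground terms as nested tuples, $\delta$ as a dictionary from `(f, (q1, ..., qk))` to `q`, $C$ as a dictionary (whose keys form $V$), the height convention of the preliminaries (constants have height 0), and a non-strict height bound $\mathrm{height}(\varphi(x)) \leq h$ for $x \in W$; the function names are illustrative.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

Delta = Dict[Tuple[str, Tuple[Hashable, ...]], Hashable]  # f(q1..qk) -> q


def run(delta: Delta, t: tuple) -> Optional[Hashable]:
    """A(t) computed bottom-up; None if undefined."""
    qs = []
    for child in t[1:]:
        q = run(delta, child)
        if q is None:
            return None
        qs.append(q)
    return delta.get((t[0], tuple(qs)))


def height(t: tuple) -> int:
    """Constants have height 0; f(t1, ..., tk) has 1 + max height(ti)."""
    return 0 if len(t) == 1 else 1 + max(height(c) for c in t[1:])


def is_solution(phi: Dict[str, tuple], delta: Delta, C: Dict[str, Hashable],
                W: Set[str], h: int) -> bool:
    """phi(x) in L(A, C(x)) for all x in V, and height(phi(x)) <= h for x in W."""
    return set(phi) == set(C) and all(          # phi must be total on V
        run(delta, phi[x]) == C[x] and (x not in W or height(phi[x]) <= h)
        for x in C
    )
```

With $W = \emptyset$ this is exactly a solution check for the single regular constraint $(A, C)$, which is why the translation from $R_1$ to $R_2$ preserves the language.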

Obviously, the set of terms with single regular constraints $(S_2, R_1 = (A, C))$ can be transformed into the set of terms with restricted regular constraints $(S_2, R_2 = (A, V, C, \emptyset, |Q| + 2H))$, and the represented language is preserved, i.e., $L((S_2, R_1)) = L((S_2, R_2))$. For a set of terms with restricted regular constraints $(S, R)$, we can define the infinite-instances property analogously to Definition 3.2, where it is defined for a set of terms with regular constraints. As mentioned before, when a term in $S$ satisfies the infinite-instances property, then $L((S, M))$ is not regular [7]. Exactly the same thing, with the same proof, can be said about a set of terms with restricted regular constraints $(S, R)$.

Lemma 3.8. Let $(S, R)$ be a set of terms with restricted regular constraints. Let $s$ be a term satisfying the infinite-instances property in $(S, R)$. Then, $L((S, R))$ is not regular.

In order to make the paper self-contained, we prove this result. The proof is 
simplified and adapted to the case of restricted regular constraints. 

Proof. We prove the lemma by contradiction, i.e. we assume that there exists a DTA B = (Q_B, Σ, δ_B, F_B) recognizing L((S, R)) in order to reach a contradiction.

By the assumptions, there exists a variable x with more than one occurrence in s, and infinitely many instances φ_1(s), φ_2(s), ... of ({s}, R) which are not instances of (S − {s}, R), and satisfying φ_i(x) ≠ φ_j(x) for all j > i ≥ 1.

Let R be (A, V, C, W, h), let A be (Q_A, Σ, δ_A), and let H be the maximum height of the terms in S. Let p_1 be one of the positions in s where x occurs.

Since the instances φ_i(s) are not in (S − {s}, R) and are pairwise different on x, and there are only finitely many terms of each height, there is a solution φ (φ = φ_i for some i ≥ 1) of R satisfying that φ(s) is not an instance of (S − {s}, R) and height(φ(x)) > H + h + |Q_A| · |Q_B|. Let p_2 be a position such that p_1.p_2 is a position of φ(s), |p_1.p_2| = H + h and height(φ(s)/(p_1.p_2)) > |Q_A| · |Q_B|. By a simple pumping argument, there exist positions p_3 and p_4 satisfying that p_1.p_2.p_3.p_4 is a position of φ(s), |p_4| ≥ 1, A(φ(s)/(p_1.p_2.p_3.p_4)) = A(φ(s)/(p_1.p_2.p_3)) and B(φ(s)/(p_1.p_2.p_3.p_4)) = B(φ(s)/(p_1.p_2.p_3)).

Let H' be height(φ(s)). Let D be the context (φ(s)/(p_1.p_2.p_3))[p_4 ← •]. We consider the term t = φ(s)[p_1.p_2.p_3 ← D^{H'}[φ(s)/(p_1.p_2.p_3)]], obtained by pumping D exactly H' times. Note that t is accepted by B, since the pumping does not change the B-state of the subterm at p_1.p_2.p_3. Thus, in order to reach a contradiction, it suffices to see that t is not an instance of (S, R). It is clearly not an instance of ({s}, R): at the position p_1 of x in s, t has the subterm t/p_1 = (φ(s)/p_1)[p_2.p_3 ← D^{H'}[φ(s)/(p_1.p_2.p_3)]], which is strictly higher than φ(s)/p_1, whereas at any other position of x in s, t still has the subterm φ(s)/p_1. Thus, it remains to see that t is not an instance of ({s'}, R) for each s' in S − {s}.

For each term s' in S − {s}, we know that the term φ(s) is not an instance of ({s'}, R), and this has to be due to one of the following reasons:

(a) There is a position q in Pos_nv(s') satisfying that q is not in Pos(φ(s)).
(b) There is a position q in Pos(φ(s)) ∩ Pos_nv(s') satisfying φ(s)[q] ≠ s'[q].
(c) There is a position q in Pos(φ(s)) ∩ Pos_v(s') satisfying A(φ(s)/q) ≠ C(s'[q]).
(d) There are positions q and q' in Pos(φ(s)) ∩ Pos_v(s') satisfying s'[q] = s'[q'] and φ(s)/q ≠ φ(s)/q'.
(e) There is a position q in Pos(φ(s)) ∩ Pos_v(s') satisfying s'[q] ∈ W and height(φ(s)/q) > h.

In cases (a), (b), (c) and (e), it is straightforward that t is not an instance of ({s'}, R) for the same reason: t and φ(s) carry the same symbols at all positions of length at most H + h, the pumping preserves the A-state of every subterm, and no subterm height decreases. Thus, assume we are in case (d). If both q and q' are disjoint from p_1.p_2, then t/q = φ(s)/q ≠ φ(s)/q' = t/q', and hence t is not an instance of ({s'}, R). If one of q or q', say q, is a prefix of p_1.p_2, then t/q ≠ t/q' also holds, because height(t/q) > H' ≥ height(t/q'). Therefore, t is not an instance of ({s'}, R) in any case, and this concludes the proof. □

For the particular case of a singleton S = {s}, Lemma 3.8 implies the following statement.

Corollary 3.9. Let ({s}, R) be a set of terms with restricted regular constraints, where R = (A, V, C, W, h). Then, L(({s}, R)) is regular if and only if, for each variable x occurring at least twice in s, either L(A, C(x)) is finite or x ∈ W.

The previous corollary naturally leads to the following definition of regular term. 

Definition 3.10. Let R = (A, V, C, W, h) be a restricted regular constraint. A term s ∈ T_Σ(V) is regular with respect to R if, for each variable x occurring at least twice in s, either L(A, C(x)) is finite or x ∈ W.
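Definition 3.10 only needs to know which states of A have an infinite language. The following Python sketch, under hypothetical encodings (variables as strings, other terms as nested tuples, the DTA as a transition dictionary), computes these states with the standard cycle criterion for bottom-up automata (our choice of method, not taken from the paper) and then checks the duplicated variables of a term.

```python
from collections import Counter

# Sketch for Definition 3.10. Assumed encodings: variables are str, other terms
# are tuples (symbol, child_1, ..., child_m), and the DTA is a dict
# delta[(symbol, (q_1, ..., q_m))] = q.


def nonempty_states(delta):
    """States q with L(A, q) nonempty (least fixpoint)."""
    ne, changed = set(), True
    while changed:
        changed = False
        for (_, qs), q in delta.items():
            if q not in ne and all(qi in ne for qi in qs):
                ne.add(q)
                changed = True
    return ne


def infinite_states(delta):
    """States q with L(A, q) infinite: q is reachable, along edges q_i -> q of
    transitions whose argument states are all nonempty, from a state on a cycle."""
    ne = nonempty_states(delta)
    succ = {q: set() for q in ne}
    for (_, qs), q in delta.items():
        if q in ne and all(qi in ne for qi in qs):
            for qi in qs:
                succ[qi].add(q)

    def reach(src):
        seen, stack = set(), list(succ[src])
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(succ[q])
        return seen

    on_cycle = {q for q in ne if q in reach(q)}
    return on_cycle.union(*(reach(q) for q in on_cycle))


def var_counts(t, acc=None):
    """Number of occurrences of each variable in t."""
    acc = Counter() if acc is None else acc
    if isinstance(t, str):
        acc[t] += 1
    else:
        for c in t[1:]:
            var_counts(c, acc)
    return acc


def is_regular_term(s, C, W, inf_states):
    """Every variable occurring at least twice has a finite language or lies in W."""
    return all(C[x] not in inf_states or x in W
               for x, n in var_counts(s).items() if n >= 2)


delta = {("a", ()): "qa", ("b", ()): "qb", ("g", ("qb",)): "qb", ("f", ("qa", "qb")): "qf"}
inf = infinite_states(delta)  # the states qb and qf
print(is_regular_term(("f", "x", "x"), {"x": "qb"}, set(), inf))  # False
print(is_regular_term(("f", "x", "x"), {"x": "qb"}, {"x"}, inf))  # True
```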

3.4. Determining a term. At this point, we want to test whether a term s satisfies the infinite-instances property with respect to S_2, that is, we want to analyze the instances of ({s}, R_2) which are not instances of (S_2 − {s}, R_2). To make this problem easier, it would be good to have s determined at all non-variable positions of the terms in S_2, according to the following definition.

Definition 3.11. For a position p and a term s ∈ T_Σ(V), we say that s is determined at p if either p ∈ Pos_nv(s) or there is a prefix p' of p such that s[p'] is a constant symbol, i.e., it is in Σ^(0). The term s is determined at a set of positions P if it is determined at each p ∈ P.

One of the nice (and obvious) properties of determined positions p of s is that, for any substitution φ mapping variables to terms, the symbol φ(s)[p] is either undefined or coincides with s[p].

Lemma 3.12. Let p be a position and s a term determined at p. Let φ_1, φ_2 be mappings from variables to T_Σ. Either p is a position of neither φ_1(s) nor φ_2(s), or φ_1(s)[p] = φ_2(s)[p].

Proof. No prefix p' of p is such that s[p'] is a variable. Hence, for every substitution φ, either φ(s) is undefined at p, if so is s, or φ(s)[p] = s[p] otherwise. □

Another nice property of determined positions is that, given a term s, a restricted regular constraint R, and a set of positions P, a set of terms s_1, ..., s_k, all of them determined at P, can be generated in time exponential in |Prefixes(P)|, such that {s} and {s_1, ..., s_k} represent the same language. The idea of determining a term at a set of positions was already used in [7].

Lemma 3.13. Let R = (A, V, C, W, h) be a restricted regular constraint, where A = (Q, Σ, δ). Let s be a term in T_Σ(V − W) and let P be a set of positions. An extension R' = (A, V', C', W, h) of R and a set of terms {s_1, ..., s_k} in T_Σ(V') satisfying the following properties can be computed in time O(|s| · |Prefixes(P)| · |δ|^{|Prefixes(P)|}).

• s_1, ..., s_k are determined at P.

• L(({s_1, ..., s_k}, R')) = L(({s}, R)).

• Each s_i can be obtained from s through a substitution which replaces each variable by a term with height bounded by the maximum length of a position in P.

• k ≤ |δ|^{|Prefixes(P)|}.

• For each i in {1, ..., k}, |s_i| is bounded by 3 · |Prefixes(P)| · |s|.

Proof. We start with (S'_1, R'_1) = ({s}, R) and transform it iteratively, while preserving the represented language, into new pairs (S'_2, R'_2), ..., (S'_f, R'_f), where we denote S'_f = {s_1, ..., s_k} and R'_f = R'. Let s' be a term of S'_i which is not determined at some p ∈ Prefixes(P) ∩ Pos_v(s'), and let R'_i = (A, V_i, C_i, W, h) be the i-th constraint. Let y = s'[p] and let q = C_i(y). The DTA A has a finite number of transitions of the form g(q_1, ..., q_m) → q, where q, q_1, ..., q_m ∈ Q and g ∈ Σ^(m) for m ≤ 2, by the assumption on Σ. For each such transition, we construct the substitution γ_{g,q_1,...,q_m,q} = [y ← g(z_1, ..., z_m)], where z_1, ..., z_m are new variables. Let V'_{i+1} be the union of these new sets of variables for all such transitions, and let C'_{i+1} be the union of all sets {(z_1, q_1), ..., (z_m, q_m)} for all such transitions. We set V_{i+1} := V_i ∪ V'_{i+1}, C_{i+1} := C'_{i+1} ∪ C_i, and R'_{i+1} := (A, V_{i+1}, C_{i+1}, W, h). Finally, we set S'_{i+1} := (S'_i − {s'}) ∪ S'', where S'' is the set of terms obtained by applying all the substitutions γ_{g,q_1,...,q_m,q} to s'. Clearly, L((S'_{i+1}, R'_{i+1})) coincides with L((S'_i, R'_i)).

At each of the |Prefixes(P)| positions we apply at most |δ| different substitutions, giving us at most |δ|^{|Prefixes(P)|}-many different terms. Thus, k ≤ |δ|^{|Prefixes(P)|}. Each substitution γ_{g,q_1,...,q_m,q} increases the size of a term s' by the arity m of g, which is at most 2, and the variable replaced has at most |s| occurrences in s'. Thus, |s| + |Prefixes(P)| + 2 · |Prefixes(P)| · |s| ≤ 3 · |Prefixes(P)| · |s| bounds the size of each s_i.

Note that only those variables y which appear at some position p ∈ Prefixes(P) may be replaced by some γ_{g,q_1,...,q_m,q} = [y ← g(z_1, ..., z_m)]. Since the new variables z_j always appear one position deeper than the variable y they substitute, it follows that, in the process described, no variable of s can be replaced by a term of height larger than the maximum length of the positions in P. □
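The iterative construction in the proof of Lemma 3.13 can be sketched in Python as a worklist algorithm. The encodings and helper names (`at`, `substitute`, `determine`) are hypothetical: variables are strings, other terms are tuples (symbol, child_1, ..., child_m), positions are tuples of 1-based indices, the DTA is a dictionary delta[(g, (q_1, ..., q_m))] = q, and fresh variables are named `_z0`, `_z1`, ..., which we assume do not clash with V.

```python
import itertools


def at(t, p):
    """Subterm t/p, or None if p is not a position of t."""
    for i in p:
        if isinstance(t, str) or i >= len(t):
            return None
        t = t[i]
    return t


def substitute(t, y, r):
    """Replace every occurrence of the variable y in t by r."""
    if isinstance(t, str):
        return r if t == y else t
    return (t[0],) + tuple(substitute(c, y, r) for c in t[1:])


def determine(s, C, delta, P):
    """Split s into terms determined at P, following the proof of Lemma 3.13.
    Returns the new terms and the extended state map C'."""
    prefs = sorted({p[:k] for p in P for k in range(len(p) + 1)})
    C = dict(C)
    fresh = itertools.count()
    done, todo = [], [s]
    while todo:
        t = todo.pop()
        # Look for a position of Prefixes(P) that is a variable position of t.
        p = next((p for p in prefs if isinstance(at(t, p), str)), None)
        if p is None:
            done.append(t)  # t is determined at P
            continue
        y = at(t, p)
        # One new term per transition g(q_1, ..., q_m) -> C(y); none if L(A, C(y)) is empty.
        for (g, qs), q in delta.items():
            if q != C[y]:
                continue
            zs = [f"_z{next(fresh)}" for _ in qs]  # assumed not to clash with V
            C.update(zip(zs, qs))
            todo.append(substitute(t, y, (g, *zs)))
    return done, C


delta = {("a", ()): "q", ("g", ("q",)): "q"}
terms, C2 = determine(("f", "x", "y"), {"x": "q", "y": "q"}, delta, {(1,)})
print(terms)  # f(a, y) and f(g(_z0), y), in some order
```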

3.5. Structurally subsumed terms. Let s be a term determined at all the non-variable positions of the terms in S_2. In order to check the infinite-instances property, our goal is to characterize the set L(({s}, R_2)) − L((S_2 − {s}, R_2)). Recall that StructDiff(S_2) is bounded by the initial |S_1|. This can be used to discard many terms in S_2 having no common instance with ({s}, R_2). To this end, we introduce the following notions. Let A be a DTA and C a mapping from variables to states of A. For a term s ∈ T_Σ(V), we define C(s) := A(s[x ← C(x) | x ∈ V]), that is, the state reached by A on s when each variable x is read as the state C(x). If C is clear from the context, we denote a term s by s^q for q = C(s), or as f^q(s_1, ..., s_m) if s = f(s_1, ..., s_m).

Definition 3.14. Let R = (A, V, C, W, h) be a restricted regular constraint over Σ, and let s, t ∈ T_Σ(V). We say that s is structurally subsumed by t (with respect to R) if, for all p in Pos_nv(t), it holds that p is in Pos(s) and t[p] = s[p], and moreover, for all p in Pos(t), it holds that C(t/p) = C(s/p).
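Definition 3.14 translates into a simultaneous recursive descent on s and t. The following Python sketch assumes hypothetical encodings (variables as strings, other terms as tuples (symbol, child_1, ..., child_m), the DTA as a dictionary) and that every term considered has a run of A. It is applied to the two examples of the next paragraph.

```python
# Sketch for Definition 3.14. Assumed encodings: variables are str, other terms
# are tuples (symbol, child_1, ..., child_m), and the DTA is a dict
# delta[(symbol, (q_1, ..., q_m))] = q. We assume every term considered has a run.


def state(t, C, delta):
    """C(t): run the DTA on t, reading each variable x as the state C(x)."""
    if isinstance(t, str):
        return C[t]
    return delta.get((t[0], tuple(state(c, C, delta) for c in t[1:])))


def subsumed_by(s, t, C, delta):
    """Is s structurally subsumed by t with respect to (A, C)?"""
    if state(s, C, delta) != state(t, C, delta):
        return False  # C(t/p) = C(s/p) fails at the current position p
    if isinstance(t, str):
        return True  # p is a variable position of t: only the states matter
    if isinstance(s, str) or s[0] != t[0]:
        return False  # p is in Pos_nv(t) but s[p] is a variable or another symbol
    return all(subsumed_by(si, ti, C, delta) for si, ti in zip(s[1:], t[1:]))


# The two examples of the next paragraph, with C(x) = C(y) = C(a) = C(b) = q.
delta = {("a", ()): "q", ("b", ()): "q", ("f", ("q", "q")): "r"}
C = {"x": "q", "y": "q"}
print(subsumed_by(("f", ("a",), ("b",)), ("f", "x", "x"), C, delta))  # True
print(subsumed_by(("f", ("a",), "y"), ("f", "x", ("a",)), C, delta))  # False
```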

Which terms in S_2 − {s} can possibly have common instances with s? If t structurally subsumes s, then they potentially have common instances (this depends on the equality constraints imposed by duplicated variables in s and t). For instance, t = f(x, x) structurally subsumes s = f(a, b) if C(x) = C(a) = C(b), but obviously s and t do not have common instances. What happens if t does not structurally subsume s? Does this imply that s and t do not have common instances? Unfortunately not: t = f(x, a) does not structurally subsume s = f(a, y), but if C(y) = C(x) and a ∈ L(A, C(x)), then ({t}, R) and ({s}, R) share f(a, a) as an instance. At this point, the benefits of determining a term come into play.

Lemma 3.15. Let R = (A, V, C, W, h) be a restricted regular constraint over Σ and let s, t ∈ T_Σ(V). If s is determined at Pos_nv(t) and s is not structurally subsumed by t, then L(({s}, R)) and L(({t}, R)) are disjoint.

Proof. Under the conditions of the lemma, and according to Definition 3.14, either there exists a position p ∈ Pos_nv(t) ∩ Pos_nv(s) such that t[p] ≠ s[p], or there exists a position p ∈ Pos(t) ∩ Pos(s) such that C(t/p) ≠ C(s/p). In the former case, it is clear that all instances of ({s}, R) and ({t}, R) differ at p; in the latter case, the result follows from the fact that L(A, q) and L(A, q') are disjoint if q ≠ q', since A is deterministic. □

Moreover, when two terms are structurally similar but not structurally equal, 
they cannot both structurally subsume a third term. 

Lemma 3.16. Let R = (A, V, C, W, h) be a restricted regular constraint over Σ and s, t_1, t_2 ∈ T_Σ(V). Assume that t_1 and t_2 are structurally similar but not structurally equal, and that s is structurally subsumed by t_1. Then s is not structurally subsumed by t_2.

Proof. If two terms t_1 and t_2 are structurally similar but not structurally equal, then there is a position p ∈ Pos_v(t_1) = Pos_v(t_2) such that C(t_1[p]) ≠ C(t_2[p]). Now, t_1 structurally subsumes s, so C(t_1[p]) = C(s/p); this prevents t_2 from subsuming s. □

Recall that, by Lemma …, we can choose at most |S_1| non-structurally similar terms in S_2. This fact, combined with Lemma 3.16, implies that at most |S_1| − 1 terms in S_2 − {s} structurally subsume s. Since, by assumption, s is determined at all non-variable positions of terms in S_2, by Lemma 3.15 only those |S_1| − 1 terms may have common instances with s. Thus, when analyzing the instances of ({s}, R_2) which are not instances of (S_2 − {s}, R_2), we can first choose the subset S_3 of terms in S_2 − {s} which structurally subsume s (because they are the only ones that can have common instances with s), and study which instances of ({s}, R_2) are not instances of (S_3, R_2). Note that |S_3| ≤ |S_1| − 1.

As mentioned before, if t structurally subsumes s, then whether they have common instances or not depends on the equality constraints imposed by duplicated variables. Since our restricted regular constraints also require that height(φ(x)) ≤ h for x ∈ W, φ(s) can only be an instance of t if the height of φ(s)/p is smaller than or equal to h whenever t[p] ∈ W.

Lemma 3.17. Let R = (A, V, C, W, h) be a restricted regular constraint. Let s and t be terms such that s is structurally subsumed by t with respect to R, and Vars(s) ∩ W = ∅. Let φ(s) be an instance of ({s}, R). Then φ(s) is an instance of ({t}, R) if and only if

• for all p, q in Pos_v(t) such that t[p] = t[q], it holds φ(s)/p = φ(s)/q, and

• for all p in Pos_v(t) such that t[p] ∈ W, it holds height(φ(s)/p) ≤ h.

Proof. Since s is structurally subsumed by t, an instance φ'(t) of t coincides with φ(s) if and only if φ'(t)/p = φ(s)/p for every p in Pos_v(t). But this condition uniquely determines φ', i.e. φ(s) is an instance φ'(t) of t if and only if φ' is defined as φ'(t[p]) := φ(s)/p, for every p in Pos_v(t). This definition of φ' is correct (i.e. uniquely defined for each variable) if and only if for all p, q in Pos_v(t) such that t[p] = t[q] it holds φ(s)/p = φ(s)/q. Thus, the first item is necessarily satisfied. Moreover, by the assumptions of the lemma, for all p ∈ Pos(t) it holds C(t/p) = C(s/p). Thus, the instance φ'(t) of t is also an instance of ({t}, R) if and only if for all p ∈ Pos_v(t) such that t[p] ∈ W, it holds height(φ(s)/p) ≤ h, as required by the second item of the lemma. □
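Under the hypotheses of Lemma 3.17, deciding whether φ(s) is an instance of ({t}, R) amounts to collecting the subterms of φ(s) at the variable positions of t and checking the two bulleted conditions. The following Python sketch assumes hypothetical encodings (variables as strings, other terms as tuples) and the convention that a constant has height 1; it does not re-check structural subsumption.

```python
# Sketch of the test of Lemma 3.17, assuming its hypotheses (s is structurally
# subsumed by t, and u = phi(s) is an instance of ({s}, R)). Assumed encodings:
# variables are str, other terms are tuples (symbol, child_1, ..., child_m).


def height(t):
    """Height counted in nodes, so a constant has height 1 (assumed convention)."""
    return 1 + max((height(c) for c in t[1:]), default=0)


def is_instance_of(u, t, W, h):
    """Is the ground term u = phi(s) an instance of ({t}, R)?"""
    binding = {}

    def walk(w, v):
        if isinstance(v, str):  # v = t[p] for some p in Pos_v(t), and w = phi(s)/p
            if binding.setdefault(v, w) != w:
                return False  # t[p] = t[q] but phi(s)/p != phi(s)/q
            return v not in W or height(w) <= h  # height bound for variables in W
        # p in Pos_nv(t): the symbols agree by structural subsumption, so descend.
        return all(walk(wi, vi) for wi, vi in zip(w[1:], v[1:]))

    return walk(u, t)


t = ("f", "x", "x")
print(is_instance_of(("f", ("a",), ("a",)), t, set(), 0))  # True
print(is_instance_of(("f", ("a",), ("g", ("a",))), t, set(), 0))  # False
```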

3.6. Formulas representing instances. By using Lemma 3.17, we are able to characterize the instances of ({s}, R_2) which are not instances of (S_3, R_2) as the solutions of a formula F, which is a disjunction of conjunctions with inequalities between terms and height restrictions of terms as predicates, together with a single regular constraint for the variables.

Definition 3.18. Let V be a finite set of variables. A formula with inequality and height predicates F (over V) is a disjunction of conjunctions of predicates of the form s ≠ t and height(s) > h, where s, t ∈ T_Σ(V) and h is a natural number. A constrained formula of order n is a triple (F, A, C), where A = (Q, Σ, δ) is a 1-or-(n+1) DTA, F is a formula with inequality and height predicates where every conjunction has at most n predicates, and C is a total function C : V → Q. Moreover, for each predicate height(s) > h we require that h is greater than or equal to |Q| + height(s). A solution of (F, A, C) is a substitution φ : V → T_Σ such that A(φ(x)) = C(x) for each x ∈ V and φ(F) evaluates to true by interpreting ≠, height, and > in the natural way. The set of all solutions is denoted Sol((F, A, C)).

We now construct a constrained formula for a given set of terms S and term s. Denote by selPos(S) the set of functions P : S → ⋃_{s'∈S} Pos_v(s') such that for each s' ∈ S, P(s') ∈ Pos_v(s'). Let W be a set of variables. We define selPos(S, W) as the subset of functions P of selPos(S) such that for each s' ∈ S, s'[P(s')] ∈ W.

Definition 3.19. Let S be a set of terms, and let R = (A, V, C, W, h) be a restricted regular constraint, where A = (Q, Σ, δ) is a 1-or-(|S| + 1) DTA. Finally, let s ∈ T_Σ(V − W) be a term that is structurally subsumed by all terms in S with respect to R. Also suppose that h is greater than or equal to |Q| + height(s). We define ℱ(s, S, W, h) as the formula

$$\bigvee_{\alpha} \Big( \bigwedge_{s' \in S'} s/P(s') \neq s/U(s') \;\wedge \bigwedge_{s' \in S - S'} \mathrm{height}\big(s/T(s')\big) > h \Big)$$

where α says that S' ⊆ S; P, U ∈ selPos(S') are such that for every s' ∈ S', P(s') ≠ U(s') and s'[P(s')] = s'[U(s')]; and T ∈ selPos(S − S', W). Note that ℱ(s, S, W, h) is a formula with inequality and height predicates and that (ℱ(s, S, W, h), A, C) is a constrained formula of order |S|.
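The conjunctions of ℱ(s, S, W, h) can be enumerated directly: each s' ∈ S contributes one predicate, either an inequality between the subterms of s at two distinct positions carrying the same variable of s', or a height predicate at a position carrying a variable of W. The following Python sketch uses hypothetical encodings (variables as strings, other terms as tuples, 1-based positions). As a simplification of ours, it lists each unordered pair of positions once instead of both orders (P(s'), U(s')), which yields an equivalent formula because ≠ is symmetric.

```python
import itertools


def var_positions(t, p=()):
    """Yield (p, t[p]) for every p in Pos_v(t); positions are 1-based tuples."""
    if isinstance(t, str):
        yield p, t
    else:
        for i, c in enumerate(t[1:], start=1):
            yield from var_positions(c, p + (i,))


def at(t, p):
    """Subterm t/p (p is assumed to be a position of t)."""
    for i in p:
        t = t[i]
    return t


def formula(s, S, W, h):
    """Conjunctions of F(s, S, W, h), each a list of predicates ("neq", u, v)
    or ("height", u, h). An empty list of conjunctions means "false"."""
    per_term = []
    for sp in S:
        vp = list(var_positions(sp))
        # A pair of distinct positions of sp holding the same variable ...
        opts = [("neq", at(s, p), at(s, q))
                for (p, x), (q, y) in itertools.combinations(vp, 2) if x == y]
        # ... or one position of sp holding a variable of W.
        opts += [("height", at(s, p), h) for p, x in vp if x in W]
        per_term.append(opts)
    return [list(c) for c in itertools.product(*per_term)]


s = ("f", "y", "z")
print(formula(s, [("f", "x", "x"), ("f", "x", "w")], {"w"}, 5))
# [[('neq', 'y', 'z'), ('height', 'z', 5)]]
```

A linear s' without variables of W contributes no option, so the whole formula becomes false: by Lemma 3.17, every instance of s is then an instance of s'.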

According to Lemma 3.17, the instances of ({s}, R) that are not instances of (S, R) are precisely the terms φ(s) with φ ∈ Sol((ℱ(s, S, W, h), A, C)). We state this in the following lemma.

Lemma 3.20. Let R = (A, V, C, W, h) be a restricted regular constraint, where A is (Q, Σ, δ). Let S be a set of terms and let s ∈ T_Σ(V − W) be a term that is structurally subsumed by all terms in S, and such that h ≥ |Q| + height(s). Then, L(({s}, R)) − L((S, R)) = {φ(s) | φ ∈ Sol((ℱ(s, S, W, h), A, C))}.

Our goal is to decide whether ({s}, R_2) has infinitely many instances, all of them different on a certain variable x, and all of them not instances of (S_3, R_2). We do not solve this problem for an arbitrary restricted regular constraint R. Recall that |S_3| ≤ |S_1| − 1, R_2 is of the form (A, V, C, W, |Q| + 2H), where H is the maximum height of a term in S_1, height(s) ≤ 2H, and A is a 1-or-|S_1| DTA. Our problem now translates to the constrained formula (ℱ(s, S_3, W, |Q| + 2H), A, C), which is of order |S_3| due to the particularities of R_2; i.e., we need to decide whether this constrained formula has infinitely many solutions, all of them different on a concrete variable x. To this end, we proceed by transforming this formula by means of the set of rules described in Figure 3.1. The following lemma states that the inference system preserves the set of solutions.



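Remove-insat1:
$$\frac{C \vee (t \neq t \wedge D)}{C}$$

Remove-insat2:
$$\frac{C \vee (s^{q} \neq t^{q} \wedge D)}{C}$$
where |L(A, q)| = 1.

Remove-sat1:
$$\frac{C \vee (s^{q} \neq t^{q'} \wedge D)}{C \vee (D)}$$
where either q ≠ q', or s[ε], t[ε] are not variables and s[ε] ≠ t[ε].

Remove-sat2:
$$\frac{C \vee (x \neq t \wedge D)}{C \vee (D)}$$
where t is not x and x ∈ Vars(t).

Decompose:
$$\frac{C \vee (f(s_1,\dots,s_m) \neq f(t_1,\dots,t_m) \wedge D)}{C \vee \bigvee_{i \in \{1,\dots,m\}} (s_i \neq t_i \wedge D)}$$

Decrease-height:
$$\frac{C \vee (\mathrm{height}(f^{q}(s_1,\dots,s_m)) > h \wedge D)}{C \vee \bigvee_{i \in \{1,\dots,m\}} (\mathrm{height}(s_i) > h-1 \wedge D)}$$
where L(A, q) is infinite, and h > |Q|.

Remove-height:
$$\frac{C \vee (\mathrm{height}(s^{q}) > h \wedge D)}{C}$$
where L(A, q) is finite, and h > |Q|.

Fig. 3.1. Inference rules for transforming formulas into final formulas.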

Lemma 3.21. If (F, A, C) is a constrained formula of order n and (F, A, C) derives into (G, A, C) by the application of one inference rule of Figure 3.1, then (G, A, C) is a constrained formula of order n, and Sol((G, A, C)) = Sol((F, A, C)).

Proof. It is clear that (G, A, C) is also a constrained formula of the same order as (F, A, C): the DTA does not change; the number of predicates in a conjunction is never increased; and the only rule that adds a new height predicate, i.e., rule Decrease-height, replaces height(f^q(s_1, ..., s_m)) > h by predicates height(s_i) > h − 1, where height(s_i) ≤ height(f^q(s_1, ..., s_m)) − 1, so the requirement that the bound is at least |Q| plus the height of the term is preserved.
To see that the solutions are preserved under the applications of the inference rules, we only need to observe that rules Remove-insat1, Remove-insat2, and Remove-height simply remove conjunctions that are impossible to satisfy; that rules Remove-sat1 and Remove-sat2 remove inequality statements from inside a conjunction which are always satisfied; and that rules Decompose and Decrease-height decompose a statement into an equivalent disjunction of statements. □

Definition 3.22. A constrained formula (F, A, C) is final if no rule can be applied on (F, A, C).

The following lemma characterizes final formulas. 

Lemma 3.23. Let (F, A, C) be a constrained formula of order n. Then, (F, A, C) is a final formula of order n if and only if F is a disjunction of conjunctions of the form

$$(x_1 \neq t_1 \wedge \dots \wedge x_m \neq t_m \wedge \mathrm{height}(y_1) > h_1 \wedge \dots \wedge \mathrm{height}(y_k) > h_k)$$

where k + m ≤ n, every x_i is a variable not occurring in the corresponding t_i, every C(x_i) coincides with its corresponding C(t_i), every |L(A, C(x_i))| > n, and every y_i is a variable satisfying |L(A, C(y_i))| = ∞.

Proof. The right-to-left implication trivially follows by inspecting that no inference rule can be applied on F. For the left-to-right implication, assume that (F, A, C) is a final formula. First, let s ≠ t be any inequality predicate of F. The terms s and t are different since rule Remove-insat1 is not applicable. One of them has to be a variable: otherwise, one of Remove-sat1 or Decompose is applicable. Without loss of generality, let s be a variable x. Then, x cannot occur in t: otherwise, rule Remove-sat2 is applicable (recall that x = s and t are different). The states C(x) and C(t) coincide: otherwise, rule Remove-sat1 is applicable. Moreover, |L(A, C(x))| > n: since A is a 1-or-(n+1) DTA, |L(A, C(x))| is either 1 or greater than n, but it cannot be 1 because, otherwise, rule Remove-insat2 would be applicable. Second, let height(u) > |Q| + h be any height predicate of F. The cardinality of L(A, C(u)) must be infinite: otherwise, rule Remove-height is applicable. Moreover, the term u must be a variable: otherwise, rule Decrease-height is applicable. □
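The following Python sketch applies the rules of Figure 3.1 exhaustively to a single conjunction and returns the equivalent disjunction of final conjunctions (an empty list meaning that the conjunction is unsatisfiable). It is an illustration under assumptions of ours: variables are strings, other terms are tuples (symbol, child_1, ..., child_m), the DTA is a dictionary that has a transition for every term encountered, and the hypothetical table `size[q]` gives |L(A, q)| (with `math.inf` for infinite languages) instead of computing it. The side condition h > |Q| is not re-checked, since constrained formulas satisfy it by Definition 3.18.

```python
import math


def is_var(t):
    return isinstance(t, str)


def state(t, C, delta):
    """C(t): the DTA state of t, reading each variable x as the state C(x)."""
    if is_var(t):
        return C[t]
    return delta[(t[0], tuple(state(c, C, delta) for c in t[1:]))]


def variables(t):
    return {t} if is_var(t) else set().union(*(variables(c) for c in t[1:]))


def reduce_conj(conj, C, delta, size):
    """Apply the rules of Fig. 3.1 to a conjunction of ("neq", s, t) and
    ("height", s, h) predicates; return a list of final conjunctions."""
    def again(new_conj):
        return reduce_conj(new_conj, C, delta, size)

    for i, pred in enumerate(conj):
        rest = conj[:i] + conj[i + 1:]
        if pred[0] == "neq":
            _, s, t = pred
            if s == t:
                return []  # Remove-insat1
            qs, qt = state(s, C, delta), state(t, C, delta)
            if qs == qt and size[qs] == 1:
                return []  # Remove-insat2
            if qs != qt or (not is_var(s) and not is_var(t) and s[0] != t[0]):
                return again(rest)  # Remove-sat1
            if not is_var(s) and not is_var(t):  # Decompose (same root symbol)
                return [c for si, ti in zip(s[1:], t[1:])
                        for c in again(rest + [("neq", si, ti)])]
            x, u = (s, t) if is_var(s) else (t, s)
            if x in variables(u):
                return again(rest)  # Remove-sat2
        else:
            _, u, h = pred
            if size[state(u, C, delta)] != math.inf:
                return []  # Remove-height
            if not is_var(u):  # Decrease-height
                return [c for ui in u[1:]
                        for c in again(rest + [("height", ui, h - 1)])]
    return [conj]  # no rule applies: the conjunction is final


delta = {("a", ()): "q", ("g", ("q",)): "q", ("f", ("q", "q")): "q"}
C, size = {"x": "q"}, {"q": math.inf}
print(reduce_conj([("neq", ("f", "x", ("a",)), ("f", ("a",), ("a",)))], C, delta, size))
# [[('neq', 'x', ('a',))]]
```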

The following lemma proves that any non-empty final formula of order n has a solution, and moreover, if some variable x has an infinite language, then there are infinitely many solutions, all of them different on x.

Lemma 3.24. Let V be a set of variables. Let (F, A, C) be any non-empty final formula (over V) of order n. Then, (F, A, C) has a solution. Moreover, if x ∈ V satisfies |L(A, C(x))| = ∞, then there exist infinitely many solutions φ_1, φ_2, ... of (F, A, C) such that φ_1(x), φ_2(x), ... are pairwise different.

Proof. Note that F is a disjunction of conjunctions ⋁_i G_i, where each (G_i, A, C) is also a non-empty final formula, and Sol((G_i, A, C)) ⊆ Sol((F, A, C)) holds. Thus, we may assume the simple case where F = G_i is a single conjunction of predicates (x_1 ≠ t_1 ∧ ... ∧ x_m ≠ t_m ∧ height(y_1) > h_1 ∧ ... ∧ height(y_k) > h_k).

We construct a solution φ of (F, A, C) by first defining φ(x) = t for each variable x satisfying that L(A, C(x)) is a singleton language {t} (note that this is the only possible choice for φ(x) in a solution). Then, we replace all occurrences of x by t in F. By Lemma 3.23, each occurrence of x must be at a child position of some node in a t_i. After that, the resulting F satisfies that no variable x with |L(A, C(x))| = 1 occurs in F, but the left-hand sides of inequalities are still variables, and each x_i ≠ t_i satisfies that x_i does not occur in t_i.

Now, we complete the definition of φ by applying the process explained below. This process chooses a particular variable x at each step, chooses a particular substitution φ(x) for it, and then replaces all occurrences of x in F by φ(x). Since each step eliminates one variable, the process terminates. Since F is modified along the execution, it can lose the property of being a final formula: for example, when an inequality x ≠ t occurs and x is instantiated, it is no longer true that each inequality has a variable in one of its sides. The choice of each φ(x) for each corresponding x is done in such a way that, whenever a predicate is made variable-free, it evaluates to true.

(a) If all inequalities of F with variables contain at least two distinct variables, then choose any variable x occurring in them. Choose any variable-free term t in L(A, C(x)), also satisfying height(t) > h if a predicate height(x) > h occurs in F (note that such a t exists since, by Lemma 3.23, the language L(A, C(x)) is then infinite). Then, define φ(x) := t. Replace each occurrence of x by t in F. Jump to (a).

(b) If F still contains a variable in some inequality, then choose an inequality s_1 ≠ t_1 with occurrences of just one variable x, i.e. satisfying Vars(s_1) ∪ Vars(t_1) = {x}. Without loss of generality, let s_1 ≠ t_1, ..., s_{m'} ≠ t_{m'} be the inequalities containing x and no other variable. Choose a variable-free term t in L(A, C(x)) such that {x ↦ t}(s_1 ≠ t_1 ∧ ... ∧ s_{m'} ≠ t_{m'}) is true (note that this is possible since |L(A, C(x))| > n ≥ m ≥ m', where n is the order of the final formula (F, A, C)), and also satisfying height(t) > h if a predicate height(x) > h occurs in F (as before, such a t exists since in this case L(A, C(x)) is infinite). Define φ(x) := t and replace each occurrence of x by t. Jump to (a).

(c) For each variable x for which φ is still not defined, choose any term t in L(A, C(x)), also satisfying height(t) > h if a predicate height(x) > h occurs in F, and define φ(x) := t. Replace each occurrence of x by t.

For the case of variables x with infinite L(A, C(x)), when the process above chooses a value for them, it has an infinite number of possibilities. Hence, infinitely many solutions φ can be found, all of them distinct on φ(x). □
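Step (b) of the proof needs a term of L(A, C(x)) that avoids the values excluded by the remaining single-variable inequalities and, possibly, exceeds a height bound. In the simple case where each such inequality excludes a single ground value, this is a bounded search, sketched below in Python. The helper names `accepted_terms` and `pick`, the tuple encoding of ground terms, the transition-dictionary DTA and the convention that a constant has height 1 are our own assumptions. The enumeration is exhaustive and bounded by `limit`, so it is only meant for tiny automata.

```python
import itertools


def height(t):
    """Height counted in nodes, so a constant has height 1 (assumed convention)."""
    return 1 + max((height(c) for c in t[1:]), default=0)


def accepted_terms(delta, max_height):
    """Map each state q to the set of ground terms of L(A, q) of height <= max_height."""
    by_state = {}
    for _ in range(max_height):
        new = {}
        for (g, qs), q in delta.items():
            for kids in itertools.product(*(by_state.get(qi, set()) for qi in qs)):
                new.setdefault(q, set()).add((g, *kids))
        by_state = new
    return by_state


def pick(delta, q, forbidden, min_height=0, limit=5):
    """A term of L(A, q) outside `forbidden` with height > min_height, found by
    searching heights 1..limit in increasing order; None if the search fails."""
    for k in range(1, limit + 1):
        layer = accepted_terms(delta, k).get(q, set())
        for t in sorted(layer, key=height):  # ties in height are broken arbitrarily
            if t not in forbidden and height(t) > min_height:
                return t
    return None


delta = {("a", ()): "q", ("g", ("q",)): "q"}
print(pick(delta, "q", {("a",), ("g", ("a",))}))  # ('g', ('g', ('a',)))
```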

Lemma 3.25. Let s ∈ T_Σ(V − W) be a term determined at Pos_nv(S), where (S, R) is a set of terms with restricted regular constraints. Let R be of the form (A, V, C, W, h), where A is a 1-or-(|S| + 1) DTA and h ≥ |Q| + height(s). It is decidable in time O(2^{|S|} · |s|^{2|S|+1} · |S|) whether ({s}, R) has an instance not in L((S, R)). In the affirmative case, if a variable x occurs at least twice in s and it satisfies |L(A, C(x))| = ∞, then s has infinitely many instances not in L((S, R)), all of them different on x.

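Proof. Let F be ℱ(s, S, W, h). By Lemmas 3.20, 3.21 and 3.24 (applied, as in the algorithm, when all terms of S structurally subsume s), ({s}, R) has an instance not in L((S, R)) if and only if the final formula obtained from (F, A, C) by exhaustively applying the rules of Figure 3.1 is non-empty, and in that case the solutions can be chosen pairwise different on any x with |L(A, C(x))| = ∞, which yields instances φ(s) pairwise different on x since x occurs in s. It remains to bound the cost. The constrained formula (F, A, C) of order |S| can easily be constructed in time T = 2^{|S|+1} · |S| · |s|^{2|S|+1}, since it has no more than 2^{|S|} · |s|^{2|S|} conjunctions, each of them with at most |S| statements of size bounded by 2 · |s|. In fact, 2^{|S|} · |s|^{2|S|} is also a bound for the total number of different conjunctions that may appear along the inference process. Thus, if we treat each conjunction once, by removing the generated ones that have already been treated, at most 2^{|S|} · |s|^{2|S|} inference steps are executed. Each inference step takes time proportional to the size of a conjunction, which is bounded by |S| · 2 · |s|, multiplied by the maximum arity, which is 2 by our simplifying assumption. Thus, the total cost is O(2^{|S|} · |s|^{2|S|+1} · |S|).
□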
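The counting behind the bound T can be spelled out as follows. This is our reconstruction of the estimate in the proof. It uses that every position chosen by P, U or T lies in Pos(s), because each s' ∈ S structurally subsumes s, and it assumes |s| ≥ 1.

```latex
% Choices per s' in S: an ordered pair (P(s'), U(s')) or a single position T(s'),
% all taken in Pos(s), which has |s| elements (assuming |s| >= 1).
\#\{\text{conjunctions of } \mathcal{F}(s,S,W,h)\}
  \;\le\; \prod_{s' \in S} \bigl(|s|^2 + |s|\bigr)
  \;\le\; \bigl(2\,|s|^2\bigr)^{|S|}
  \;=\; 2^{|S|}\,|s|^{2|S|}

% Construction time: conjunctions x statements per conjunction x statement size.
T \;=\; 2^{|S|}\,|s|^{2|S|} \cdot |S| \cdot 2|s|
  \;=\; 2^{|S|+1}\,|S|\,|s|^{2|S|+1}
```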

3.7. The algorithm. We summarize in Figure 3.2 the EXPTIME algorithm for deciding regularity of a set of terms with regular constraints.

The algorithm starts by transforming the input instance (S_1, M_1) into an equivalent set of terms with single regular constraints in exponential time, according to
Input: set of terms with regular constraints (S_1, M_1)
Compute the set of terms with single regular constraints (S_2, R_1).
W := ∅.
For each non-regular s ∈ S_2 do:
    Determine s at Pos_nv(S_2), giving {s_1, ..., s_k}.
    For each i = 1 to k do:
        Compute S_3 := {t ∈ S_2 | t structurally subsumes s_i} − {s}.
        Run infinite-instances(s_i, S_3, A, C, W, |Q| + 2H):
            { Build formula F := ℱ(s_i, S_3, W, |Q| + 2H),
              φ := Reduce(F),
              Return (φ ≠ empty formula). }
        If infinite-instances returns true, then Return ("not-regular");
    W := W ∪ {x | x occurs ≥ 2 times in s}.
Return ("regular");

Fig. 3.2. The EXPTIME algorithm for deciding regularity.

Lemma 2.2. Then, the algorithm (implicitly) considers a restricted regular constraint R_2 = (A, V, C, W, |Q| + 2H), where W = ∅ at the beginning. The determination of s into s_1, ..., s_k is done according to Lemma 3.13. This determination also takes time exponential in the size of the input instance (S_1, M_1), since Prefixes(Pos_nv(S_2)) coincides with Prefixes(Pos_nv(S_1)). Finally, the infinite-instances property can also be decided in exponential time, due to Lemma 3.25 and the fact that |S_3| ≤ |S_1| − 1. Thus, it follows from the previous lemmas that the algorithm runs in exponential time with respect to ||S_1|| + ||M_1||.

Now we discuss the correctness of the algorithm. Let R'_2 be the extension of R_2 obtained when determining s into s_1, ..., s_k, according to Lemma 3.13. Assume the case where none of the s_i satisfies the infinite-instances property in S_2, and consider a concrete term s_i satisfying that ({s_i}, R'_2) has instances not in L((S_2 − {s}, R'_2)). By Lemma 3.25, s_i cannot have duplicated variables with associated infinite language. Now, consider a duplicated variable x with infinite language occurring at a position p in s. By Lemma 3.13, s_i/p has height bounded by H, and it also occurs at another position in s_i. Thus, all the variables y occurring in s_i/p are duplicated in s_i, and hence L(({y}, R'_2)) is finite. In particular, any term instantiating them to get an instance has height bounded by |Q|. Therefore, L(({s_i/p}, R'_2)) is finite, and any such instance t of ({s_i}, R'_2) satisfies height(t/p) ≤ |Q| + H.

From the above considerations we conclude that all instances t of ({s}, R_2) not in L((S_2 − {s}, R_2)) satisfy the following statement: for each position p with a duplicated variable in s, height(t/p) ≤ |Q| + H ≤ |Q| + 2H. Thus, by adding the duplicated variables in s to W, we preserve the represented language.

Theorem 3.26. The above algorithm solves RITRC in exponential time. 

4. Concluding Remarks. In this contribution we have shown that the RITRC problem is EXPTIME-hard, and have presented a new algorithm that solves RITRC in exponential time. This problem is a particular case of the HOM problem [5]: given a DTA A and a tree homomorphism H, is H(L(A)) regular? The decidability of this problem is a long-standing open question. The main problem is how to handle non-linearity of H, and to determine in which cases it forces non-regularity of H(L(A)). Our algorithm gives some intuition about when non-linearity poses a real problem for the regularity of the represented set (it also gives an exponential time solution for the HOM problem in the case that non-linear rules are only applied at bounded depth of the input tree, cf. [7]). But it is still far from solving the general problem.